80 research outputs found

    Comparing Czech and English AMRs

    This paper compares Czech and English annotation using the Abstract Meaning Representation (AMR) formalism.

    Extending an Event-type Ontology: Adding Verbs and Classes Using Fine-tuned LLMs Suggestions

    In this project, we investigated the use of advanced machine learning methods, specifically fine-tuned large language models, for pre-annotating data for a lexical extension task, namely adding descriptive words (verbs) to an existing but still incomplete ontology of event types. We focused on several research questions, from investigating possible heuristics that give annotators at least hints about which verbs to include and which fall outside the current version of the ontology, to using the automatic scores to help annotators find, more efficiently, a threshold for identifying verbs that cannot be assigned to any existing class and should therefore serve as seeds for a new class. We also carefully examined the correlation of the automatic scores with the human annotation. While the correlation turned out to be strong, its influence on the annotation proper is modest due to its near linearity, even though the mere fact of such pre-annotation leads to relatively short annotation times. Comment: Accepted to LAW-XVII @ ACL 202

    Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation

    We present a system for verbal Word Sense Disambiguation (WSD) that can exploit additional information from parallel texts and lexicons. It extends our previous WSD method, which gave promising results but used only monolingual features. In the follow-up work described here, we explored two additional ideas: using English-Czech bilingual resources (as features only; the task itself remains a monolingual WSD task), and using a 'hybrid' approach that adds features extracted both from a parallel corpus and from manually aligned bilingual valency lexicon entries, which contain subcategorization information. Although not all types of features proved useful, both ideas led to significant improvements for both languages explored.

    Machine Translation of Medical Texts in the Khresmoi Project

    The WMT 2014 Medical Translation Task poses an interesting challenge for Machine Translation (MT). In the standard translation task, the end application is the translation itself. In this task, the MT system is considered part of a larger system for cross-lingual information retrieval (IR).

    Adaptation of machine translation for multilingual information retrieval in the medical domain

    Objective. We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve the effectiveness of cross-lingual IR. Methods and Data. Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech–English, German–English, and French–English. MT quality is evaluated on data sets created within the Khresmoi project, and IR effectiveness is tested on the CLEF eHealth 2013 data sets. Results. The search query translation results achieved in our experiments are outstanding – our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in a direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech–English, from 23.03 to 40.82 for German–English, and from 32.67 to 40.82 for French–English. This is a 55% improvement on average. In terms of IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French–English. For Czech–English and German–English, the increased MT quality does not lead to better IR results. Conclusions. Most of the MT techniques employed in our experiments improve MT of medical search queries. Intelligent training data selection in particular proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with IR performance – better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.

    CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

    The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems. Peer reviewed

    Khresmoi Professional: Multilingual Semantic Search for Medical Professionals

    There is increasing interest in and need for innovative solutions to medical search. In this paper we present the EU-funded Khresmoi medical search and access system, currently in year 3 of 4 of development across 12 partners. The Khresmoi system uses a component-based architecture housed in the cloud to allow for the development of several innovative applications to support target users' medical information needs. The Khresmoi search systems based on this architecture have been designed to support the multilingual and multimodal information needs of three target groups: the general public, general practitioners, and consultant radiologists. In this paper we focus on the presentation of the systems to support the latter two groups using semantic, multilingual text and image-based (including 2D and 3D radiology images) search.

    Relatório de estágio em farmácia comunitária [Internship Report in Community Pharmacy]

    Internship report produced as part of the Integrated Master's degree in Pharmaceutical Sciences, submitted to the Faculty of Pharmacy of the University of Coimbra.

    Valency of Verbs in the Prague Dependency Treebank

    Title: Valency of Verbs in the Prague Dependency Treebank (Czech title: Valence sloves v Pražském závislostním korpusu). Author: PhDr. Zdeňka Urešová. Department: Institute of Formal and Applied Linguistics (Ústav formální a aplikované lingvistiky), Faculty of Mathematics and Physics (MFF UK). Supervisor: Prof. PhDr. Eva Hajičová, DrSc. Abstract: This dissertation describes PDT-Vallex, a valency lexicon of Czech verbs, and its relation to the annotation of the Prague Dependency Treebank (PDT). The PDT-Vallex lexicon was created during the annotation of the PDT and is a valuable source of verbal valency information, available both for linguistic research and for computerized natural language processing. In this thesis, we describe not only the structure and design of the lexicon (which is closely related to the notion of valency as developed in the Functional Generative Description of language) but also the relation between PDT-Vallex and the PDT. The explicit and full-coverage linking of the lexicon to the treebank prompted us to pay special attention to diatheses; we propose formal transformation rules for diatheses to handle their surface realization even when the canonical forms of verb arguments as captured in the lexicon do not correspond to the forms of these arguments actually appearing in the corpus.